Intro to ValidMind

ValidMind Python Library Introduction

%load_ext dotenv
%dotenv .env

import pandas as pd
import xgboost as xgb

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

%matplotlib inline

Initializing the ValidMind Library

After creating an account with ValidMind, we can find the project’s API key and secret in the settings page of the ValidMind dashboard.

The library credentials can be configured in two ways:

  • By setting the VM_API_KEY and VM_API_SECRET environment variables, or
  • By passing api_key and api_secret arguments to the init function like this:
vm.init(
    api_key='<your-api-key>',
    api_secret='<your-api-secret>',
    project="cl2r3k1ri000009jweny7ba1g"
)

The project argument is required, since it lets the library associate all collected data with a specific project in your account.

import validmind as vm

vm.init(
  api_host = "http://localhost:3000/api/v1/tracking",
  project = "clhhz04x40000wcy6shay2oco"
)
Connected to ValidMind. Project: Customer Churn Model - Initial Validation (clhhz04x40000wcy6shay2oco)

Using a demo dataset

For this simple demonstration, we will use the following bank customer churn dataset from Kaggle: https://www.kaggle.com/code/kmalit/bank-customer-churn-prediction/data.

We will train a sample model and demonstrate the following library functionalities:

  • Logging information about a dataset
  • Running data quality tests on a dataset
  • Logging information about a model
  • Logging training metrics for a model
  • Running model evaluation tests

Running a data quality test plan

We will now run the default data quality test plan that will collect the following metadata from a dataset:

  • Field types and descriptions
  • Descriptive statistics
  • Data distribution histograms
  • Feature correlations

and will run a collection of data quality tests such as:

  • Class imbalance
  • Duplicates
  • High cardinality
  • Missing values
  • Skewness

ValidMind evaluates if the data quality metrics are within expected ranges. These thresholds or ranges can be further configured by model validators.
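
To make the pass/fail logic concrete, here is a minimal sketch of what a class imbalance threshold check boils down to. This is not ValidMind code: the function name is our own, and the threshold semantics loosely mirror the min_percent_threshold parameter that appears in the test results.

```python
import pandas as pd

def class_imbalance_passed(target: pd.Series, min_percent_threshold: float = 0.2) -> bool:
    """Pass when the minority class makes up at least the threshold fraction of rows."""
    return target.value_counts(normalize=True).min() >= min_percent_threshold

# Toy target column with a ~21% minority class, similar to Exited in our dataset
y = pd.Series([0] * 790 + [1] * 210)
print(class_imbalance_passed(y))  # True
```

The same pattern (compute a metric, compare against a configurable threshold) underlies the other data quality tests as well.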

Load our demo dataset

Before running the test plan, we must first load the dataset into a Pandas DataFrame and initialize a ValidMind dataset object:

df = pd.read_csv("./datasets/bank_customer_churn.csv")

vm_dataset = vm.init_dataset(
    dataset=df,
    target_column="Exited",
    class_labels={
        "0": "Did not exit",
        "1": "Exited",
    }
)
Pandas dataset detected. Initializing VM Dataset instance...
Inferring dataset types...

Initialize and run the TabularDataset test plan

We can now initialize the TabularDataset test plan. The primary method of doing this is with the run_test_plan function from the vm module. This function takes in a test plan name (in this case tabular_dataset) and a dataset keyword argument (the vm_dataset object we created earlier):

tabular_plan = vm.run_test_plan("tabular_dataset", dataset=vm_dataset)

Results for Tabular Dataset Description Test Plan:


Test plan to extract metadata and descriptive statistics from a tabular dataset

Logged the following dataset to the ValidMind platform:

RowNumber CustomerId CreditScore Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
count 8000.000000 8.000000e+03 8000.000000 8000.000000 8000.000000 8000.000000 8000.000000 8000.000000 8000.000000 8000.000000 8000.000000
mean 5020.520000 1.569047e+07 650.159625 38.948875 5.033875 76434.096511 1.532500 0.702625 0.519875 99790.187959 0.202000
std 2885.718516 7.190247e+04 96.846230 10.458952 2.885267 62612.251258 0.580505 0.457132 0.499636 57520.508892 0.401517
min 1.000000 1.556570e+07 350.000000 18.000000 0.000000 0.000000 1.000000 0.000000 0.000000 11.580000 0.000000
25% 2518.750000 1.562816e+07 583.000000 32.000000 3.000000 0.000000 1.000000 0.000000 0.000000 50857.102500 0.000000
50% 5036.500000 1.569014e+07 651.500000 37.000000 5.000000 97263.675000 1.000000 1.000000 1.000000 99504.890000 0.000000
75% 7512.250000 1.575238e+07 717.000000 44.000000 8.000000 128044.507500 2.000000 1.000000 1.000000 149216.320000 0.000000
max 10000.000000 1.581566e+07 850.000000 92.000000 10.000000 250898.090000 4.000000 1.000000 1.000000 199992.480000 1.000000

Logged the following dataset metric to the ValidMind platform:

Metric Name
descriptive_statistics
Metric Type
dataset
Metric Scope
Metric Value
{'numerical': [{'Name': 'RowNumber', 'Count': 8000.0, 'Mean': 5020.52, 'Std': 2885.7185155986554, 'Min': 1.0, '25%': 2518.75, '50%': 5036.5, '75%': 7512.25, '90%': 9015.1, '95%': 9516.05, 'Max': 10000.0}, {'Name': 'CustomerId', 'Count': 8000.0, 'Mean': 15690474.465625, 'Std': 71902.473335347, 'Min': 15565701.0, '25%': 15628163.75, '50%': 15690143.5, '75%': 15752378.25, '90%': 15790809.1, '95%': 15802760.55, 'Max': 15815660.0}, {'Name': 'CreditScore', 'Count': 8000.0, 'Mean': 650.159625, 'Std': 96.84623014808636, 'Min': 350.0, '25%': 583.0, '50%': 651.5, '75%': 717.0, '90%': 778.0, '95%': 813.0, 'Max': 850.0}, {'Name': 'Age', 'Count': 8000.0, 'Mean': 38.948875, 'Std': 10.458952382767269, 'Min': 18.0, '25%': 32.0, '50%': 37.0, '75%': 44.0, '90%': 53.0, '95%': 60.0, 'Max': 92.0}, {'Name': 'Tenure', 'Count': 8000.0, 'Mean': 5.033875, 'Std': 2.885267419215253, 'Min': 0.0, '25%': 3.0, '50%': 5.0, '75%': 8.0, '90%': 9.0, '95%': 9.0, 'Max': 10.0}, {'Name': 'Balance', 'Count': 8000.0, 'Mean': 76434.09651125, 'Std': 62...

Logged the following dataset metric to the ValidMind platform:

Metric Name
dataset_correlations
Metric Type
dataset
Metric Scope
training
Metric Value
[[{'field': 'CreditScore', 'value': 1.0}, {'field': 'Geography', 'value': 0.010103440458197478}, {'field': 'Gender', 'value': 0.008251776778083898}, {'field': 'Age', 'value': -0.007269780957496768}, {'field': 'Tenure', 'value': -0.006914675142663373}, {'field': 'NumOfProducts', 'value': 0.005677094521946256}, {'field': 'HasCrCard', 'value': -0.009291152528707963}, {'field': 'IsActiveMember', 'value': 0.030554141043824444}, {'field': 'Exited', 'value': -0.025533166369817405}], [{'field': 'CreditScore', 'value': 0.010103440458197478}, {'field': 'Geography', 'value': 1.0}, {'field': 'Gender', 'value': 0.035023152881466464}, {'field': 'Age', 'value': 0.053602289473512775}, {'field': 'Tenure', 'value': 0.015510338111733172}, {'field': 'NumOfProducts', 'value': 0.011118424429054087}, {'field': 'HasCrCard', 'value': 0.021747611293409512}, {'field': 'IsActiveMember', 'value': 0.02017951122934769}, {'field': 'Exited', 'value': 0.1784101181767361}], [{'field': 'CreditScore', 'value': 0.008251776778083898}, {'field': 'G...
Metric Plots

Results for Tabular Data Quality Test Plan:


Test plan for data quality on tabular datasets

Logged the following test result to the ValidMind platform:

Class Imbalance
Test Name
class_imbalance
Category
data_quality
Passed
True
Params
{'min_percent_threshold': 0.2}

Logged the following test result to the ValidMind platform:

Duplicates
Test Name
duplicates
Category
data_quality
Passed
True
Params
{'min_threshold': 1}

Logged the following test result to the ValidMind platform:

Cardinality
Test Name
cardinality
Category
data_quality
Passed
False
Params
{'num_threshold': 100, 'percent_threshold': 0.1, 'threshold_type': 'percent'}

Logged the following test result to the ValidMind platform:

Pearson Correlation
Test Name
pearson_correlation
Category
data_quality
Passed
False
Params
{'max_threshold': 0.3}

Logged the following test result to the ValidMind platform:

Missing
Test Name
missing
Category
data_quality
Passed
True
Params
{'min_threshold': 1}

Logged the following test result to the ValidMind platform:

Skewness
Test Name
skewness
Category
data_quality
Passed
False
Params
{'max_threshold': 1}

Logged the following test result to the ValidMind platform:

Unique
Test Name
unique
Category
data_quality
Passed
False
Params
{'min_percent_threshold': 1}

Logged the following test result to the ValidMind platform:

Zeros
Test Name
zeros
Category
data_quality
Passed
False
Params
{'max_percent_threshold': 0.03}

Finding all test plans available in the developer framework

We can find all the test plans available in the developer framework by calling the following functions:

  • All test plans: vm.test_plans.list_plans()
  • Describe a test plan: vm.test_plans.describe_plan("tabular_dataset")
  • List all available tests: vm.test_plans.list_tests()

As an example, here’s the output of list_plans() and list_tests():

vm.test_plans.list_plans()
ID Name Description
sklearn_classifier_metrics SKLearnClassifierMetrics Test plan for sklearn classifier metrics
sklearn_classifier_validation SKLearnClassifierPerformance Test plan for sklearn classifier models
sklearn_classifier_model_diagnosis SKLearnClassifierDiagnosis Test plan for sklearn classifier model diagnosis tests
sklearn_classifier SKLearnClassifier Test plan for sklearn classifier models that includes both metrics and validation tests
tabular_dataset TabularDataset Test plan for generic tabular datasets
tabular_dataset_description TabularDatasetDescription Test plan to extract metadata and descriptive statistics from a tabular dataset
tabular_data_quality TabularDataQuality Test plan for data quality on tabular datasets
normality_test_plan NormalityTestPlan Test plan to perform normality tests.
autocorrelation_test_plan AutocorrelationTestPlan Test plan to perform autocorrelation tests.
seasonality_test_plan SesonalityTestPlan Test plan to perform seasonality tests.
unit_root UnitRoot Test plan to perform unit root tests.
stationarity_test_plan StationarityTestPlan Test plan to perform stationarity tests.
timeseries TimeSeries Test plan for time series statsmodels that includes both metrics and validation tests
time_series_data_quality TimeSeriesDataQuality Test plan for data quality on time series datasets
time_series_dataset TimeSeriesDataset Test plan for time series datasets
time_series_univariate TimeSeriesUnivariate Test plan to perform time series univariate analysis.
time_series_multivariate TimeSeriesMultivariate Test plan to perform time series multivariate analysis.
time_series_forecast TimeSeriesForecast Test plan to perform time series forecast tests.
regression_model_performance RegressionModelPerformance Test plan for statsmodels regressor models that includes both metrics and validation tests
vm.test_plans.list_tests()
Test Type ID Name Description
Custom Test dataset_metadata DatasetMetadata Custom class to collect a set of descriptive statistics for a dataset. This class will log dataset metadata via `log_dataset` instead of a metric. Dataset metadata is necessary to initialize a dataset object that can be related to different metrics and test results
Custom Test shap SHAPGlobalImportance SHAP Global Importance. Custom metric
Metric acf_pacf_plot ACFandPACFPlot Plots ACF and PACF for a given time series dataset.
Metric adf ADF Augmented Dickey-Fuller unit root test for establishing the order of integration of time series
Metric accuracy AccuracyScore Accuracy Score
Metric box_pierce BoxPierce The Box-Pierce test is a statistical test used to determine whether a given set of data has autocorrelations that are different from zero.
Metric csi CharacteristicStabilityIndex Characteristic Stability Index between two datasets
Metric confusion_matrix ConfusionMatrix Confusion Matrix
Metric dickey_fuller_gls DFGLSArch Dickey-Fuller GLS unit root test for establishing the order of integration of time series
Metric dataset_correlations DatasetCorrelations Extracts the correlation matrix for a dataset. The following coefficients are calculated: - Pearson's R for numerical variables - Cramer's V for categorical variables - Correlation ratios for categorical-numerical variables
Metric dataset_description DatasetDescription Collects a set of descriptive statistics for a dataset
Metric dataset_split DatasetSplit Attempts to extract information about the dataset split from the provided training, test and validation datasets.
Metric descriptive_statistics DescriptiveStatistics Collects a set of descriptive statistics for a dataset, both for numerical and categorical variables
Metric f1_score F1Score F1 Score
Metric jarque_bera JarqueBera The Jarque-Bera test is a statistical test used to determine whether a given set of data follows a normal distribution.
Metric kpss KPSS Kwiatkowski-Phillips-Schmidt-Shin (KPSS) unit root test for establishing the order of integration of time series
Metric kolmogorov_smirnov KolmogorovSmirnov The Kolmogorov-Smirnov metric is a statistical test used to determine whether a given set of data follows a normal distribution.
Metric ljung_box LJungBox The Ljung-Box test is a statistical test used to determine whether a given set of data has autocorrelations that are different from zero.
Metric lagged_correlation_heatmap LaggedCorrelationHeatmap Generates a heatmap of correlations between the target variable and the lags of independent variables in the dataset.
Metric lilliefors_test Lilliefors The Lilliefors test is a statistical test used to determine whether a given set of data follows a normal distribution.
Metric model_metadata ModelMetadata Custom class to collect the following metadata for a model: - Model architecture - Model hyperparameters - Model task type
Metric model_prediction_ols ModelPredictionOLS Calculates and plots the model predictions for each of the models
Metric pfi PermutationFeatureImportance Permutation Feature Importance
Metric phillips_perron PhillipsPerronArch Phillips-Perron (PP) unit root test for establishing the order of integration of time series
Metric psi PopulationStabilityIndex Population Stability Index between two datasets
Metric pr_curve PrecisionRecallCurve Precision Recall Curve
Metric precision PrecisionScore Precision Score
Metric roc_auc ROCAUCScore ROC AUC Score
Metric roc_curve ROCCurve ROC Curve
Metric recall RecallScore Recall Score
Metric RegressionModelSummary Test that outputs the summary of regression models from the statsmodels library.
Metric residuals_visual_inspection ResidualsVisualInspection Log plots for visual inspection of residuals
Metric rolling_stats_plot RollingStatsPlot This class provides a metric to visualize the stationarity of a given time series dataset by plotting the rolling mean and rolling standard deviation. The rolling mean represents the average of the time series data over a fixed-size sliding window, which helps in identifying trends in the data. The rolling standard deviation measures the variability of the data within the sliding window, showing any changes in volatility over time. By analyzing these plots, users can gain insights into the stationarity of the time series data and determine if any transformations or differencing operations are required before applying time series models.
Metric runs_test RunsTest The runs test is a statistical test used to determine whether a given set of data has runs of positive and negative values that are longer than expected under the null hypothesis of randomness.
Metric scatter_plot ScatterPlot Generates a visual analysis of data by plotting a scatter plot matrix for all columns in the dataset. The input dataset can have multiple columns (features) if necessary.
Metric shapiro_wilk ShapiroWilk The Shapiro-Wilk test is a statistical test used to determine whether a given set of data follows a normal distribution.
Metric spread_plot SpreadPlot This class provides a metric to visualize the spread between pairs of time series variables in a given dataset. By plotting the spread of each pair of variables in separate figures, users can assess the relationship between the variables and determine if any cointegration or other time series relationships exist between them.
Metric time_series_histogram TimeSeriesHistogram Generates a visual analysis of time series data by plotting the histogram. The input dataset can have multiple time series if necessary. In this case we produce a separate plot for each time series.
Metric time_series_line_plot TimeSeriesLinePlot Generates a visual analysis of time series data by plotting the raw time series. The input dataset can have multiple time series if necessary. In this case we produce a separate plot for each time series.
Metric zivot_andrews ZivotAndrewsArch Zivot-Andrews unit root test for establishing the order of integration of time series
ThresholdTest class_imbalance ClassImbalance The class imbalance test measures the disparity between the majority class and the minority class in the target column.
ThresholdTest duplicates Duplicates The duplicates test measures the number of duplicate rows found in the dataset. If a primary key column is specified, the dataset is checked for duplicate primary keys as well.
ThresholdTest cardinality HighCardinality The high cardinality test measures the number of unique values found in categorical columns.
ThresholdTest pearson_correlation HighPearsonCorrelation Test that the pairwise Pearson correlation coefficients between the features in the dataset do not exceed a specified threshold.
ThresholdTest accuracy_score MinimumAccuracy Test that the model's prediction accuracy on a dataset meets or exceeds a predefined threshold.
ThresholdTest f1_score MinimumF1Score Test that the model's F1 score on the validation dataset meets or exceeds a predefined threshold.
ThresholdTest roc_auc_score MinimumROCAUCScore Test that the model's ROC AUC score on the validation dataset meets or exceeds a predefined threshold.
ThresholdTest missing MissingValues Test that the number of missing values in the dataset across all features is less than a threshold
ThresholdTest overfit_regions OverfitDiagnosis Test that identify overfit regions with high residuals by histogram slicing techniques.
ThresholdTest robustness RobustnessDiagnosis Test robustness of model by perturbing the features column values
ThresholdTest skewness Skewness The skewness test measures the extent to which a distribution of values differs from a normal distribution. A positive skew describes a longer tail of values in the right and a negative skew describes a longer tail of values in the left.
ThresholdTest time_series_frequency TimeSeriesFrequency Test that detect frequencies in the data
ThresholdTest time_series_missing_values TimeSeriesMissingValues Test that the number of missing values is less than a threshold
ThresholdTest time_series_outliers TimeSeriesOutliers Test that find outliers for time series data using the z-score method
ThresholdTest zeros TooManyZeroValues The zeros test finds columns that have too many zero values.
ThresholdTest training_test_degradation TrainingTestDegradation Test that the degradation in performance between the training and test datasets does not exceed a predefined threshold.
ThresholdTest unique UniqueRows Test that the number of unique rows is greater than a threshold
ThresholdTest weak_spots WeakspotsDiagnosis Test that identify weak regions with high residuals by histogram slicing techniques.

Preparing the dataset for training

Before we train a model, we need to run some common minimal feature selection and engineering steps on the dataset:

  • Dropping irrelevant variables
  • Encoding categorical variables

Dropping irrelevant variables

The following variables will be dropped from the dataset:

  • RowNumber: a unique identifier for the record
  • CustomerId: a unique identifier for the customer
  • Surname: this variable has no predictive power
  • CreditScore: we didn’t observe any correlation between CreditScore and our target column Exited
df.drop(["RowNumber", "CustomerId", "Surname", "CreditScore"], axis=1, inplace=True)

Encoding categorical variables

We will encode the following categorical variables:

  • Geography: one-hot (dummy) encoded, since only 3 unique values are found in the dataset
  • Gender: mapped from string to integer (0 for Male, 1 for Female)
genders = {"Male": 0, "Female": 1}
df.replace({"Gender": genders}, inplace=True)
df = pd.concat([df, pd.get_dummies(df["Geography"], prefix="Geography")], axis=1)
df.drop("Geography", axis=1, inplace=True)

We are now ready to train our model with the preprocessed dataset:

df.head()
Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited Geography_France Geography_Germany Geography_Spain
0 1 42 2 0.00 1 1 1 101348.88 1 1 0 0
1 1 41 1 83807.86 1 0 1 112542.58 0 0 0 1
2 1 42 8 159660.80 3 1 0 113931.57 1 1 0 0
3 1 39 1 0.00 2 0 0 93826.63 0 1 0 0
4 1 43 2 125510.82 1 1 1 79084.10 0 0 0 1

Dataset preparation

For training our model, we will randomly split the dataset into three parts:

  • training split with 60% of the rows
  • validation split with 20% of the rows
  • test split with 20% of the rows

The test dataset will be our held out dataset for model evaluation.

train_df, test_df = train_test_split(df, test_size=0.20)

# Splitting off 25% of the remaining 80% yields a 60/20/20 split
train_ds, val_ds = train_test_split(train_df, test_size=0.25)

# For training
x_train = train_ds.drop("Exited", axis=1)
y_train = train_ds.loc[:, "Exited"].astype(int)
x_val = val_ds.drop("Exited", axis=1)
y_val = val_ds.loc[:, "Exited"].astype(int)

# For testing
x_test = test_df.drop("Exited", axis=1)
y_test = test_df.loc[:, "Exited"].astype(int)
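
As a quick sanity check on the split arithmetic, here is a standalone sketch on a dummy frame with the same 8,000-row shape as our dataset (illustrative only; the fixed random_state is just for reproducibility here):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"x": range(8000), "Exited": [0, 1] * 4000})

train_df, test_df = train_test_split(df, test_size=0.20, random_state=0)
# 25% of the remaining 80% equals 20% of the original rows
train_ds, val_ds = train_test_split(train_df, test_size=0.25, random_state=0)

print(len(train_ds), len(val_ds), len(test_df))  # 4800 1600 1600
```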

Model training

We will train a simple XGBoost model and set its eval_set to [(x_train, y_train), (x_val, y_val)] in order to collect validation datasets metrics on every round. The ValidMind library supports collecting any type of “in training” metrics so model developers can provide additional context to model validators if necessary.

model = xgb.XGBClassifier(early_stopping_rounds=10)
model.set_params(
    eval_metric=["error", "logloss", "auc"],
)
model.fit(
    x_train,
    y_train,
    eval_set=[(x_train, y_train), (x_val, y_val)],
    verbose=False,
)
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, early_stopping_rounds=10,
              enable_categorical=False, eval_metric=['error', 'logloss', 'auc'],
              feature_types=None, gamma=None, gpu_id=None, grow_policy=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_bin=None, max_cat_threshold=None,
              max_cat_to_onehot=None, max_delta_step=None, max_depth=None,
              max_leaves=None, min_child_weight=None, missing=nan,
              monotone_constraints=None, n_estimators=100, n_jobs=None,
              num_parallel_tree=None, predictor=None, random_state=None, ...)
y_pred = model.predict_proba(x_val)[:, -1]  # probability of the positive class
predictions = [round(value) for value in y_pred]  # implicit 0.5 decision threshold
accuracy = accuracy_score(y_val, predictions)

print(f"Accuracy: {accuracy}")
Accuracy: 0.858125

Running a model evaluation test plan

We will now run a basic model evaluation test plan that is compatible with the model we have trained. Since we have trained an XGBoost model with a sklearn-like API, we will use the SKLearnClassifier test plan. This test plan will collect model metadata and metrics, and run a variety of model evaluation tests, according to the modeling objective (binary classification for this example).

The following model metadata is collected:

  • Model framework and architecture (e.g. XGBoost, Random Forest, Logistic Regression, etc.)
  • Model task details (e.g. binary classification, regression, etc.)
  • Model hyperparameters (e.g. number of trees, max depth, etc.)

The model metrics that are collected depend on the model type, use case, etc. For example, for a binary classification model, the following metrics could be collected (again, depending on configuration):

  • AUC
  • Error rate
  • Logloss
  • Feature importance
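
For reference, the AUC and error-rate metrics above can be reproduced directly with scikit-learn. The scores below are illustrative toy values, not the tutorial model's output:

```python
from sklearn.metrics import accuracy_score, log_loss, roc_auc_score

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]  # predicted probabilities of the positive class

print(roc_auc_score(y_true, y_score))                           # AUC: 0.75
print(round(log_loss(y_true, y_score), 4))                      # logloss
print(1 - accuracy_score(y_true, [round(s) for s in y_score]))  # error rate: 0.25
```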

Similarly, different model evaluation tests are run depending on the model type, use case, etc. For example, for a binary classification model, the following tests could be executed:

  • Simple training/test overfit test
  • Training/test performance degradation
  • Baseline test dataset performance test
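
For instance, a training/test degradation check reduces to a comparison like the one below. This is our own illustration with a hypothetical max_degradation parameter, not the ValidMind implementation:

```python
def degradation_ok(train_score: float, test_score: float,
                   max_degradation: float = 0.10) -> bool:
    """Pass when the relative drop from train to test stays under the threshold."""
    return (train_score - test_score) / train_score <= max_degradation

print(degradation_ok(0.90, 0.85))  # True: ~5.6% relative drop
print(degradation_ok(0.95, 0.80))  # False: ~15.8% relative drop
```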

Initialize VM model object and train/test datasets

In order to run our SKLearnClassifier test plan, we need to initialize ValidMind object instances for the trained model and the training and test datasets:

vm_model = vm.init_model(model)
vm_train_ds = vm.init_dataset(dataset=train_ds, type="generic", target_column="Exited")
vm_test_ds = vm.init_dataset(dataset=test_df, type="generic", target_column="Exited")
Pandas dataset detected. Initializing VM Dataset instance...
Inferring dataset types...
Pandas dataset detected. Initializing VM Dataset instance...
Inferring dataset types...

We can now run the SKLearnClassifier test plan:

model_plan = vm.run_test_plan("sklearn_classifier", model=vm_model, train_ds=vm_train_ds, test_ds=vm_test_ds)

Results for Sklearn Classifier Metrics Test Plan:


Test plan for sklearn classifier metrics

Logged the following model metric to the ValidMind platform:

Metric Name
model_metadata
Metric Type
model
Metric Scope
test
Metric Value
{'architecture': 'Extreme Gradient Boosting', 'task': 'classification', 'subtask': 'binary', 'framework': 'XGBoost', 'framework_version': '1.7.5', 'params': {'objective': 'binary:logistic', 'base_score': None, 'booster': None, 'colsample_bylevel': None, 'colsample_bynode': None, 'colsample_bytree': None, 'eval_metric': ['error', 'logloss', 'auc'], 'gamma': None, 'gpu_id': None, 'grow_policy': None, 'interaction_constraints': None, 'learning_rate': None, 'max_bin': None, 'max_cat_threshold': None, 'max_cat_to_onehot': None, 'max_delta_step': None, 'max_depth': None, 'max_leaves': None, 'min_child_weight': None, 'monotone_constraints': None, 'n_jobs': None, 'num_parallel_tree': None, 'predictor': None, 'random_state': None, 'reg_alpha': None, 'reg_lambda': None, 'sampling_method': None, 'scale_pos_weight': None, 'subsample': None, 'tree_method': None, 'validate_parameters': None, 'verbosity': None}}

Logged the following dataset metric to the ValidMind platform:

Metric Name
dataset_split
Metric Type
dataset
Metric Scope
Metric Value
{'train_ds_size': 4800, 'train_ds_proportion': 0.75, 'test_ds_size': 1600, 'test_ds_proportion': 0.25, 'total_size': 6400}

Logged the following evaluation metric to the ValidMind platform:

Metric Name
accuracy
Metric Type
evaluation
Metric Scope
test
Metric Value
0.85375

Logged the following evaluation metric to the ValidMind platform:

Metric Name
confusion_matrix
Metric Type
evaluation
Metric Scope
test
Metric Value
{'tn': 1220, 'fp': 46, 'fn': 188, 'tp': 146}
Metric Plots

Logged the following evaluation metric to the ValidMind platform:

Metric Name
f1_score
Metric Type
evaluation
Metric Scope
test
Metric Value
0.5551330798479087

Logged the following training metric to the ValidMind platform:

Metric Name
pfi
Metric Type
training
Metric Scope
training_dataset
Metric Value
{'Gender': ([0.005500000000000016], [0.001168153814072993]), 'Age': ([0.09354166666666668], [0.004370036867375639]), 'Tenure': ([0.004833333333333356], [0.00033333333333333826]), 'Balance': ([0.022333333333333316], [0.0020258742968571877]), 'NumOfProducts': ([0.06670833333333333], [0.0018929694486000744]), 'HasCrCard': ([0.0016666666666666607], [0.00029462782549439376]), 'IsActiveMember': ([0.03441666666666667], [0.0014813657362192476]), 'EstimatedSalary': ([0.009208333333333329], [0.0014754942674688038]), 'Geography_France': ([0.0007083333333333331], [0.0005368374469469038]), 'Geography_Germany': ([0.014666666666666672], [0.0017410485346479926]), 'Geography_Spain': ([0.0001666666666666705], [0.0004639803635691593])}
Metric Plots

Logged the following evaluation metric to the ValidMind platform:

Metric Name
pr_curve
Metric Type
evaluation
Metric Scope
test
Metric Value
{'precision': array([0.20875   , 0.20901126, 0.2130102 , ..., 1.        , 1.        ,
       1.        ]), 'recall': array([1.        , 1.        , 1.        , ..., 0.00598802, 0.00299401,
       0.        ]), 'thresholds': array([0.022918  , 0.02306371, 0.0250723 , ..., 0.96385795, 0.96962535,
       0.9773121 ], dtype=float32)}
Metric Plots

Logged the following evaluation metric to the ValidMind platform:

Metric Name
precision
Metric Type
evaluation
Metric Scope
test
Metric Value
0.7604166666666666

Logged the following evaluation metric to the ValidMind platform:

Metric Name
recall
Metric Type
evaluation
Metric Scope
test
Metric Value
0.437125748502994

Logged the following evaluation metric to the ValidMind platform:

Metric Name
roc_auc
Metric Type
evaluation
Metric Scope
test
Metric Value
0.7003954176954149

Logged the following evaluation metric to the ValidMind platform:

Metric Name
roc_curve
Metric Type
evaluation
Metric Scope
test
Metric Value
{'auc': 0.7003954176954149, 'fpr': array([0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 7.89889415e-04,
       7.89889415e-04, 7.89889415e-04, 7.89889415e-04, 7.89889415e-04,
       7.89889415e-04, 1.57977883e-03, 1.57977883e-03, 1.57977883e-03,
       1.57977883e-03, 1.57977883e-03, 1.57977883e-03, 2.36966825e-03,
       2.36966825e-03, 2.36966825e-03, 2.36966825e-03, 3.15955766e-03,
       3.15955766e-03, 3.94944708e-03, 3.94944708e-03, 4.73933649e-03,
       4.73933649e-03, 5.52922591e-03, 5.52922591e-03, 6.31911532e-03,
       6.31911532e-03, 6.31911532e-03, 6.31911532e-03, 1.02685624e-02,
       1.02685624e-02, 1.10584518e-02, 1.10584518e-02, 1.26382306e-02,
       1.26382306e-02, 1.34281201e-02, 1.34281201e-02, 1.42180095e-02,
       1.42180095e-02, 1.50078989e-02, 1.50078989e-02, 1.57977883e-02,
       1.57977883e-02, 1.65876777e-02, 1.65876777e-02, 1.73775671e-02,
       1.73775671e-02, 1.81674566e-02, 1.81674566e-02, 1.97472354e-02,
       1.97472354e-02, 2.05371248e-02, 2.05371248e-02, 2.13270142e...
Metric Plots

Logged the following training metric to the ValidMind platform:

Metric Name
csi
Metric Type
training
Metric Scope
Metric Value
{'Gender': 3.7e-05, 'Age': 0.000377, 'Tenure': 0.000354, 'Balance': 0.000943, 'NumOfProducts': 0.000261, 'HasCrCard': 9.3e-05, 'IsActiveMember': 0.0, 'EstimatedSalary': 0.000372, 'Geography_France': 9e-06, 'Geography_Germany': 1e-05, 'Geography_Spain': 0.0}

Logged the following training metric to the ValidMind platform:

Metric Name
psi
Metric Type
training
Metric Scope
Metric Value
     initial  percent_initial  new  percent_new       psi
bin                                                      
1       2584          0.53833  855     0.534375  0.000029
2        799          0.16646  255     0.159375  0.000308
3        437          0.09104  149     0.093125  0.000047
4        236          0.04917   91     0.056875  0.001123
5        150          0.03125   58     0.036250  0.000742
6        113          0.02354   42     0.026250  0.000295
7        119          0.02479   37     0.023125  0.000116
8         90          0.01875   25     0.015625  0.000570
9        126          0.02625   37     0.023125  0.000396
10       146          0.03042   51     0.031875  0.000068

Logged the following plots to the ValidMind platform:

Metric Plots

Results for Sklearn Classifier Validation Test Plan:


Test plan for sklearn classifier models

Logged the following test result to the ValidMind platform:

Accuracy Score
Test Name
accuracy_score
Category
model_performance
Passed
True
Params
{'min_threshold': 0.7}

Logged the following test result to the ValidMind platform:

F1 Score
Test Name
f1_score
Category
model_performance
Passed
False
Params
{'min_threshold': 0.5}

Logged the following test result to the ValidMind platform:

Roc Auc Score
Test Name
roc_auc_score
Category
model_performance
Passed
False
Params
{'min_threshold': 0.5}

Logged the following test result to the ValidMind platform:

Training Test Degradation
Test Name
training_test_degradation
Category
model_performance
Passed
False
Params
{'metrics': ['accuracy', 'precision', 'recall', 'f1']}

Results for Sklearn Classifier Model Diagnosis Test Plan:


Test plan for sklearn classifier model diagnosis tests

Logged the following test result to the ValidMind platform:

Overfit Regions
Test Name
overfit_regions
Category
model_diagnosis
Passed
False
Params
{'features_columns': None, 'cut_off_percentage': 4}
Metric Plots

Logged the following test result to the ValidMind platform:

Weak Spots
Test Name
weak_spots
Category
model_diagnosis
Passed
False
Params
{'features_columns': None, 'thresholds': {'accuracy': 0.75, 'precision': 0.5, 'recall': 0.5, 'f1': 0.7}}
Metric Plots

Logged the following test result to the ValidMind platform:

Robustness
Test Name
robustness
Category
model_diagnosis
Passed
True
Params
{'features_columns': None, 'scaling_factor_std_dev_list': [0.01, 0.02]}
Metric Plots